Grid Capacity Pipeline

Aug 2025 ~ Flo Energy

Length:   1mo (at 0.75 FTE)

Programming languages:
   - Python (requests, base64, BytesIO, pdfplumber, Pandas, datetime, pytz, re, Dagster)
   - SQL (dbt)

Data:   Daily byte-endcoded PDF reports from Electricity Market Company (EMC) containing a table about planned and unplanned outages, alongside grid capacity tables tracking supply and demand metrics at 30-minute intervals for the next two weeks

Report Composition

Problem description:
Automate a pipeline that fetches a byte-endcoded PDF report via API, parses multiple tables, loads them into Snowflake via Amazon S3, and refreshes dbt models powering a Holistics dashboard

Approach & Results:
The architecture of the pipeline is shown in the diagram below. A Dagster job with daily asset partitions orchestrates the workflow, starting with a configurable Dagster resource for the EMC API. This resource requests the byte-encoded PDF report via OAuth using a JWT token. Next, the raw PDF is uploaded to Amazon S3 for archival, while at the same time the 15-page file is parsed with pdfplumber to extract multiple tables.

DISCLAIMER: Asset names have been modified to be more generic to comply with company confidentiality requirements.

Grid Capacity Pipeline Diagram

The three tables processed from the report are the 14-day grid capacity data at 30-minute intervals, planned outages of equipment, and unplanned outages. In parallel, each table is converted to a dataframe, with new features derived such as dates extracted from text columns using regular expressions. Enforced checks ensure correct processing before the dataframes are exported as JSON to Amazon S3 and loaded into raw Snowflake tables. These tables feed downstream dbt models which refresh the Holistics dashboard powering grid analytics.

DISCLAIMER: Numerical values in the Holistics charts have been redacted to comply with company confidentiality requirements.

Grid Capacity Dashboard

A Dagster retry policy was implemented to handle API timeouts and parsing errors with up to two retries with one-minute delays between runs. The pipeline executes daily through Dagster's built-in scheduler with no manual intervention, delivering reliable grid capacity data needed for energy trading.

  • Address

    Amsterdam, the Netherlands